Search | BVS - Ministry of Health

1.

ABC-HuMi: the Atlas of Biosynthetic Gene Clusters in the Human Microbiome.

Hirsch, Pascal; Tagirdzhanov, Azat; Kushnareva, Aleksandra; Olkhovskii, Ilia; Graf, Simon; Schmartz, Georges P; Hegemann, Julian D; Bozhüyük, Kenan A J; Müller, Rolf; Keller, Andreas; Gurevich, Alexey.

Nucleic Acids Res ; 52(D1): D579-D585, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37994699

ABSTRACT

The human microbiome has emerged as a rich source of diverse and bioactive natural products, harboring immense potential for therapeutic applications. To facilitate systematic exploration and analysis of its biosynthetic landscape, we present ABC-HuMi: the Atlas of Biosynthetic Gene Clusters (BGCs) in the Human Microbiome. ABC-HuMi integrates data from major human microbiome sequence databases and provides an expansive repository of BGCs compared to the limited coverage offered by existing resources. Employing state-of-the-art BGC prediction and analysis tools, our database ensures accurate annotation and enhanced prediction capabilities. ABC-HuMi empowers researchers with advanced browsing, filtering, and search functionality, enabling efficient exploration of the resource. At present, ABC-HuMi boasts a catalog of 19 218 representative BGCs derived from the human gut, oral, skin, respiratory and urogenital systems. By capturing the intricate biosynthetic potential across diverse human body sites, our database fosters profound insights into the molecular repertoire encoded within the human microbiome and offers a comprehensive resource for the discovery and characterization of novel bioactive compounds. The database is freely accessible at https://www.ccb.uni-saarland.de/abc_humi/.

Subjects

Biosynthetic Pathways; Databases, Genetic; Microbiota; Multigene Family; Humans; Biosynthetic Pathways/genetics; Computational Biology/instrumentation; Internet; Microbiota/genetics; Multigene Family/genetics; Metagenome/genetics

2.

vissE.cloud: a webserver to visualise higher order molecular phenotypes from enrichment analysis.

Mohamed, Ahmed; Bhuva, Dharmesh D; Lee, Sam; Liu, Ning; Tan, Chin Wee; Davis, Melissa J.

Nucleic Acids Res ; 51(W1): W593-W600, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37158226

ABSTRACT

Gene-set analysis (GSA) dominates the functional interpretation of omics data and downstream hypothesis generation. Despite its ability to summarise thousands of measurements into semantically interpretable components, GSA often results in hundreds of significantly enriched gene-sets. However, summarisation and effective visualisation of GSA results to facilitate hypothesis generation are still lacking. While some webservers provide gene-set visualisation tools, there is still a need for tools that can effectively summarise and guide exploration of GSA results. To enable versatility, webservers accept gene lists as input; however, none provide end-to-end solutions for emerging data types such as single-cell and spatial omics. Here, we present vissE.Cloud, a webserver for end-to-end gene-set analysis, offering gene-set summarisation and highly interactive visualisation. vissE.Cloud uses algorithms from our earlier R package vissE to summarise GSA results by identifying biological themes. We maintain versatility by allowing analysis of gene lists as well as raw single-cell and spatial omics data, including CosMx and Xenium data, making vissE.Cloud the first webserver to provide end-to-end gene-set analysis of sub-cellular localised spatial data. Structuring the results hierarchically allows swift interactive investigation of results at the gene, gene-set, and cluster levels. vissE.Cloud is freely available at https://www.vissE.Cloud.

Subjects

Computational Biology; Data Visualization; Software; Algorithms; Phenotype; Internet; Computational Biology/instrumentation; Computational Biology/methods

3.

WebTetrado: a webserver to explore quadruplexes in nucleic acid 3D structures.

Adamczyk, Bartosz; Zurkowski, Michal; Szachniuk, Marta; Zok, Tomasz.

Nucleic Acids Res ; 51(W1): W607-W612, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37158242

ABSTRACT

Quadruplexes are four-stranded DNA/RNA motifs of high functional significance that fold into complex shapes. They are widely recognized as important regulators of genomic processes and are among the most frequently investigated potential drug targets. Despite interest in quadruplexes, few studies focus on automatic tools that help to understand the many unique features of their 3D folds. In this paper, we introduce WebTetrado, a web server for analyzing 3D structures of quadruplexes. It has a user-friendly interface and offers many advanced features, including automatic identification, annotation, classification, and visualization of the motif. The program applies to experimental or in silico generated 3D models provided in PDB and PDBx/mmCIF files. It supports canonical G-quadruplexes as well as non-G-based quartets. It can process unimolecular, bimolecular, and tetramolecular quadruplexes. WebTetrado is implemented as a publicly available web server with an intuitive interface and can be freely accessed at https://webtetrado.cs.put.poznan.pl/.

Subjects

Computational Biology; Computer Simulation; Data Visualization; G-Quadruplexes; Software; Nucleic Acid Conformation; Nucleotide Motifs; Internet; Computational Biology/instrumentation; Computational Biology/methods

4.

αCharges: partial atomic charges for AlphaFold structures in high quality.

Schindler, Ondrej; Berka, Karel; Cantara, Alessio; Krenek, Ales; Tichý, Dominik; Racek, Tomás; Svobodová, Radka.

Nucleic Acids Res ; 51(W1): W11-W16, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37158246

ABSTRACT

The AlphaFold2 prediction algorithm opened up the possibility of exploring proteins' structural space at an unprecedented scale. Currently, >200 million protein structures predicted by this approach are deposited in AlphaFoldDB, covering entire proteomes of multiple organisms, including humans. Predicted structures are, however, stored without detailed functional annotations describing their chemical behaviour. Partial atomic charges, which map electron distribution over a molecule and provide a clue to its chemical reactivity, are an important example of such data. We introduce the web application αCharges: a tool for the quick calculation of partial atomic charges for protein structures from AlphaFoldDB. The charges are calculated by the recent empirical method SQE+qp, parameterised for this class of molecules using robust quantum mechanics charges (B3LYP/6-31G*/NPA) on PROPKA3 protonated structures. The computed partial atomic charges can be downloaded in common data formats or visualised via the powerful Mol* viewer. The αCharges application is freely available at https://alphacharges.ncbr.muni.cz with no login requirement.

Subjects

Computational Biology; Proteins; Software; Humans; Algorithms; Proteome; Protein Conformation; Proteins/chemistry; Computational Biology/instrumentation; Computational Biology/methods

5.

IRSOM2: a web server for predicting bifunctional RNAs.

Postic, Guillaume; Tav, Christophe; Platon, Ludovic; Zehraoui, Farida; Tahi, Fariza.

Nucleic Acids Res ; 51(W1): W281-W288, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37158254

ABSTRACT

Recent advances have shown that some biologically active non-coding RNAs (ncRNAs) are actually translated into polypeptides that have a physiological function as well. This paradigm shift requires adapted computational methods to predict this new class of 'bifunctional RNAs'. Previously, we developed IRSOM, an open-source algorithm to classify non-coding and coding RNAs. Here, we use the binary statistical model of IRSOM as a ternary classifier, called IRSOM2, to identify bifunctional RNAs as a rejection of the two other classes. We present its easy-to-use web interface, which allows users to perform predictions on large datasets of RNA sequences in a short time, to re-train the model with their own data, and to visualize and analyze the classification results thanks to the implementation of self-organizing maps (SOM). We also propose a new benchmark of experimentally validated RNAs that play both protein-coding and non-coding roles, in different organisms. Thus, IRSOM2 showed promising performance in detecting these bifunctional transcripts among ncRNAs of different types, such as circRNAs and lncRNAs (in particular those of shorter lengths). The web server is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.

Subjects

Algorithms; Computational Biology; Computer Simulation; RNA; RNA, Long Noncoding/chemistry; Sequence Analysis, RNA/methods; Computational Biology/instrumentation; Computational Biology/methods; RNA/chemistry; RNA/classification; Internet
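
The IRSOM2 abstract describes using a binary coding/non-coding model as a ternary classifier, with bifunctional RNAs identified as a rejection of the two other classes. The abstract does not give the rejection rule, so the following is only a minimal sketch of the general idea, assuming the binary model outputs a coding probability and that a symmetric band around 0.5 is treated as "reject". The function name, the `reject_margin` parameter, and its default value are hypothetical and are not taken from IRSOM2.

```python
from typing import Literal

Label = Literal["coding", "non-coding", "bifunctional"]


def classify_with_rejection(p_coding: float, reject_margin: float = 0.2) -> Label:
    """Turn a binary coding probability into a three-way label.

    p_coding      -- hypothetical binary-model output in [0, 1]
    reject_margin -- hypothetical half-width of the rejection band around 0.5
    """
    if not 0.0 <= p_coding <= 1.0:
        raise ValueError("p_coding must lie in [0, 1]")
    if p_coding >= 0.5 + reject_margin:
        return "coding"
    if p_coding <= 0.5 - reject_margin:
        return "non-coding"
    # Neither class is confidently supported, so both are rejected.
    return "bifunctional"


if __name__ == "__main__":
    for p in (0.95, 0.50, 0.05):
        print(p, classify_with_rejection(p))
```

A wider `reject_margin` assigns more sequences to the rejected class; in practice such a threshold would have to be tuned on validated data.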

6.

GPS 6.0: an updated server for prediction of kinase-specific phosphorylation sites in proteins.

Chen, Miaomiao; Zhang, Weizhi; Gou, Yujie; Xu, Danyang; Wei, Yuxiang; Liu, Dan; Han, Cheng; Huang, Xinhe; Li, Chengzhi; Ning, Wanshan; Peng, Di; Xue, Yu.

Nucleic Acids Res ; 51(W1): W243-W250, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37158278

ABSTRACT

Protein phosphorylation, catalyzed by protein kinases (PKs), is one of the most important post-translational modifications (PTMs) and is involved in regulating almost all biological processes. Here, we report an updated server, Group-based Prediction System (GPS) 6.0, for prediction of PK-specific phosphorylation sites (p-sites) in eukaryotes. First, we pre-trained a general model using penalized logistic regression (PLR), deep neural network (DNN), and Light Gradient Boosting Machine (LightGBM) on 490 762 non-redundant p-sites in 71 407 proteins. Then, transfer learning was conducted to obtain 577 PK-specific predictors at the group, family and single PK levels, using a well-curated data set of 30 043 known site-specific kinase-substrate relations in 7041 proteins. Together with the evolutionary information, GPS 6.0 could hierarchically predict PK-specific p-sites for 44 046 PKs in 185 species. Besides the basic statistics, we also offer knowledge from 22 public resources to annotate the prediction results, including the experimental evidence, physical interactions, sequence logos, and p-sites in sequences and 3D structures. The GPS 6.0 server is freely available at https://gps.biocuckoo.cn. We believe that GPS 6.0 could be a highly useful service for further analysis of phosphorylation.

Subjects

Computational Biology; Proteins; Software; Phosphorylation; Protein Kinases/chemistry; Protein Kinases/metabolism; Protein Processing, Post-Translational; Proteins/chemistry; Proteins/metabolism; Computational Biology/instrumentation; Computational Biology/methods; Internet

7.

OnTarget: in silico design of MiniPromoters for targeted delivery of expression.

Fornes, Oriol; Av-Shalom, Tamar V; Korecki, Andrea J; Farkas, Rachelle A; Arenillas, David J; Mathelier, Anthony; Simpson, Elizabeth M; Wasserman, Wyeth W.

Nucleic Acids Res ; 51(W1): W379-W386, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37166953

ABSTRACT

MiniPromoters, or compact promoters, are short DNA sequences that can drive expression in specific cells and tissues. While broadly useful, they are of high relevance to gene therapy due to their role in enabling precise control of where a therapeutic gene will be expressed. Here, we present OnTarget (http://ontarget.cmmt.ubc.ca), a webserver that streamlines the MiniPromoter design process. Users only need to specify a gene of interest or custom genomic coordinates on which to focus the identification of promoters and enhancers, and can also provide relevant cell-type-specific genomic evidence (e.g. accessible chromatin regions, histone modifications, etc.). OnTarget combines the provided data with internal data to identify candidate promoters and enhancers and design MiniPromoters. To illustrate the utility of OnTarget, we designed and characterized two MiniPromoters targeting different cell populations relevant to Parkinson Disease.

Subjects

Computational Biology; Computer Simulation; Promoter Regions, Genetic; Software; Enhancer Elements, Genetic/genetics; Genome; Genomics; Promoter Regions, Genetic/genetics; Internet; Computational Biology/instrumentation; Computational Biology/methods

8.

g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update).

Kolberg, Liis; Raudvere, Uku; Kuzmin, Ivan; Adler, Priit; Vilo, Jaak; Peterson, Hedi.

Nucleic Acids Res ; 51(W1): W207-W212, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37144459

ABSTRACT

g:Profiler is a reliable and up-to-date functional enrichment analysis tool that supports various evidence types, identifier types and organisms. The toolset integrates many databases, including Gene Ontology, KEGG and TRANSFAC, to provide a comprehensive and in-depth analysis of gene lists. It also provides interactive and intuitive user interfaces and supports ordered queries and custom statistical backgrounds, among other settings. g:Profiler provides multiple programmatic interfaces to access its functionality. These can be easily integrated into custom workflows and external tools, making them valuable resources for researchers who want to develop their own solutions. g:Profiler has been available since 2007 and is used to analyse millions of queries. Research reproducibility and transparency are achieved by maintaining working versions of all past database releases since 2015. g:Profiler supports 849 species, including vertebrates, plants, fungi, insects and parasites, and can analyse any organism through user-uploaded custom annotation files. In this update article, we introduce a novel filtering method highlighting Gene Ontology driver terms, accompanied by new graph visualizations providing a broader context for significant Gene Ontology terms. As a leading enrichment analysis and gene list interoperability service, g:Profiler offers a valuable resource for genetics, biology and medical researchers. It is freely accessible at https://biit.cs.ut.ee/gprofiler.

Subjects

Chromosome Mapping; Computational Biology; Genes; Software; Animals; Chromosome Mapping/instrumentation; Chromosome Mapping/methods; Databases, Genetic; Internet; Reproducibility of Results; User-Computer Interface; Computational Biology/instrumentation; Computational Biology/methods; Genes/genetics; Humans
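
Functional enrichment of a gene list, as described in the g:Profiler abstract, is commonly scored with a one-sided hypergeometric (over-representation) test. The sketch below shows that calculation in plain Python. It is an illustration under that assumption and is not g:Profiler's code; the function and parameter names are hypothetical. The `background` argument corresponds to the idea of a custom statistical background mentioned in the abstract, and no multiple-testing correction is applied here.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int, query: int, overlap: int) -> float:
    """Return P(X >= overlap) for a one-sided hypergeometric test.

    background -- number of genes in the statistical background
    annotated  -- background genes annotated with the term of interest
    query      -- number of genes in the query list
    overlap    -- query genes that carry the annotation
    """
    if not (0 <= annotated <= background and 0 <= query <= background):
        raise ValueError("annotated and query must lie between 0 and background")
    lower = max(overlap, 0)
    upper = min(annotated, query)
    total = comb(background, query)
    # Sum the probability mass of every outcome at least as extreme as `overlap`.
    tail = sum(
        comb(annotated, k) * comb(background - annotated, query - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 20000 background genes carry the term,
    # and 8 of the 200 query genes do.
    print(hypergeom_enrichment_p(20000, 40, 200, 8))
```

Shrinking the background to the genes actually measured in an experiment changes `background` and `annotated`, and therefore the p-value, which is why the choice of background matters.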

9.

KVFinder-web: a web-based application for detecting and characterizing biomolecular cavities.

Guerra, João V S; Ribeiro-Filho, Helder V; Pereira, José G C; Lopes-de-Oliveira, Paulo S.

Nucleic Acids Res ; 51(W1): W289-W297, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37140050

ABSTRACT

Molecular interactions that modulate catalytic processes occur mainly in cavities throughout the molecular surface. Such interactions occur with specific small molecules due to geometric and physicochemical complementarity with the receptor. In this scenario, we present KVFinder-web, an open-source web-based application of parKVFinder software for cavity detection and characterization of biomolecular structures. The KVFinder-web has two independent components: a RESTful web service and a web graphical portal. Our web service, KVFinder-web service, handles client requests, manages accepted jobs, and performs cavity detection and characterization on accepted jobs. Our graphical web portal, KVFinder-web portal, provides a simple and straightforward page for cavity analysis, which customizes detection parameters, submits jobs to the web service component, and displays cavities and characterizations. We provide a publicly available KVFinder-web at https://kvfinder-web.cnpem.br, running in a cloud environment as docker containers. Further, this deployment type allows KVFinder-web components to be configured locally and customized according to user demand. Hence, users may run jobs on a locally configured service or our public KVFinder-web.

Subjects

Computational Biology; Software; Computational Biology/instrumentation; Computational Biology/methods; Internet; User-Computer Interface

10.

The LightDock Server: Artificial Intelligence-powered modeling of macromolecular interactions.

Jiménez-García, Brian; Roel-Touris, Jorge; Barradas-Bautista, Didier.

Nucleic Acids Res ; 51(W1): W298-W304, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37140054

ABSTRACT

Computational docking is an instrumental method of the structural biology toolbox. Specifically, integrative modeling software, such as LightDock, arises as a complementary and synergistic method alongside experimental structural biology techniques. Ubiquitousness and accessibility are fundamental features to promote ease of use and to improve user experience. With this goal in mind, we have developed the LightDock Server, a web server for the integrative modeling of macromolecular interactions, along with several dedicated usage modes. The server builds upon the LightDock macromolecular docking framework, which has proved useful for modeling medium-to-high flexible complexes, antibody-antigen interactions, or membrane-associated protein assemblies. We believe that this free-to-use resource will be a valuable addition to the structural biology community. It can be accessed online at: https://server.lightdock.org/.

Subjects

Artificial Intelligence; Computational Biology; Macromolecular Substances; Molecular Docking Simulation; Computational Biology/instrumentation; Computational Biology/methods; Internet; Macromolecular Substances/chemistry; Software

11.

DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction.

Basu, Sushmita; Gsponer, Jörg; Kurgan, Lukasz.

Nucleic Acids Res ; 51(W1): W141-W147, 2023 07 05.

Article in English | MEDLINE | ID: mdl-37140058

ABSTRACT

Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.

Subjects

Computational Biology; Data Visualization; Internet; Proteins; Computational Biology/instrumentation; Computational Biology/methods; Databases, Protein; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Protein Binding; User-Computer Interface
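
The DEPICTER2 abstract states a limit of 25 proteins per request, so larger protein sets have to be split before submission. A minimal sketch of that preparation step follows, assuming FASTA-formatted input. The helper names `parse_fasta` and `batches` are hypothetical and are not part of DEPICTER2, and the code only groups sequences; it does not contact the server.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: List[Record], size: int = 25) -> Iterator[List[Record]]:
    """Yield consecutive groups of at most `size` records (25 per the abstract)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


if __name__ == "__main__":
    demo = "".join(f">protein_{i}\nMKTAYIAK\n" for i in range(60))
    print([len(b) for b in batches(parse_fasta(demo))])
```

Each yielded batch can then be written back out as its own FASTA file and submitted as a separate request.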

12.

ViralFlow: A Versatile Automated Workflow for SARS-CoV-2 Genome Assembly, Lineage Assignment, Mutations and Intrahost Variant Detection.

Dezordi, Filipe Zimmer; Neto, Antonio Marinho da Silva; Campos, Túlio de Lima; Jeronimo, Pedro Miguel Carneiro; Aksenen, Cleber Furtado; Almeida, Suzana Porto; Wallau, Gabriel Luz.

Viruses ; 14(2), 2022 01 23.

Article in English | MEDLINE | ID: mdl-35215811

ABSTRACT

The COVID-19 pandemic is driven by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) that emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold standard methodology used to monitor and study this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has put additional pressure on the urgent need for streamlined bioinformatics workflows. Here, we describe a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of SARS-CoV-2 reference-based genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and the screening of intrahost variants. The pipeline is capable of processing a batch of around 100 samples in less than half an hour on a personal laptop or in less than five minutes on a server with 50 threads. The workflow presented here is available through Docker or Singularity images, allowing for implementation on laptops for small-scale analyses or on high processing capacity servers or clusters. Moreover, the low requirements for memory and CPU cores and the standardized results provided by ViralFlow highlight it as a versatile tool for SARS-CoV-2 genomic analysis.

Subjects

Automation, Laboratory/methods; Genome, Viral; Mutation; SARS-CoV-2/classification; SARS-CoV-2/genetics; Workflow; Computational Biology/instrumentation; Computational Biology/methods; Genomics/instrumentation; Genomics/methods; Humans; Phylogeny; Spike Glycoprotein, Coronavirus/genetics; Virus Assembly/genetics

13.

SBSA: an online service for somatic binding sequence annotation.

Jiang, Limin; Guo, Fei; Tang, Jijun; Yu, Hui; Ness, Scott; Duan, Mingrui; Mao, Peng; Zhao, Ying-Yong; Guo, Yan.

Nucleic Acids Res ; 50(1): e4, 2022 01 11.

Article in English | MEDLINE | ID: mdl-34606615

ABSTRACT

Efficient annotation of alterations in binding sequences of molecular regulators can help identify novel candidates for mechanistic studies and offer original therapeutic hypotheses. In this work, we developed Somatic Binding Sequence Annotator (SBSA) as a full-capacity online tool to annotate altered binding motifs/sequences, addressing diverse types of genomic variants and molecular regulators. The genomic variants can be somatic mutations, single nucleotide polymorphisms, RNA editing events, etc. The binding motifs/sequences involve transcription factors (TFs), RNA-binding proteins, miRNA seeds, miRNA-mRNA 3'-UTR binding targets, or can be any custom motifs/sequences. Compared to similar tools, SBSA is the first to support miRNA seeds and miRNA-mRNA 3'-UTR binding targets, and the first to implement a personalized genome approach that accommodates joint adjacent variants. SBSA can support an unrestricted range of species and includes preloaded reference genomes for SARS-CoV-2 and 25 other common organisms. We demonstrated SBSA by annotating multi-omics data from over 30,890 human subjects. Of the millions of somatic binding sequences identified, many have known severe biological repercussions, such as the somatic mutation in the TERT promoter region that causes a gained binding sequence for E26 transformation-specific factor (ETS1). We further validated the function of this TERT mutation using experimental data in cancer cells. Availability: http://innovebioinfo.com/Annotation/SBSA/SBSA.php.

Subjects

COVID-19/virology; Computational Biology/instrumentation; Genomics/instrumentation; Mutation; Proteomics/instrumentation; SARS-CoV-2; 3' Untranslated Regions; Algorithms; Amino Acid Motifs; COVID-19/metabolism; Computational Biology/methods; Computers; Genetic Techniques; Genome, Human; Genomics/methods; Humans; Internet; MicroRNAs/metabolism; Phenotype; Promoter Regions, Genetic; Protein Binding; Proteomics/methods; Proto-Oncogene Protein c-ets-1/genetics; Proto-Oncogene Protein c-ets-1/metabolism; RNA-Binding Proteins/metabolism; Telomerase/metabolism

14.

sideRETRO: uma ferramenta de bioinformática dedicada à identificação de inserções polimórficas, germinativas ou somáticas, de pseudogenes processados / sideRETRO: a bioinformatics tool for identifying somatic and polymorphic insertions of processed pseudogenes

Miller, Thiago Luiz Araujo.

São Paulo; s.n; 2022. 186 p. tables, graphs, illustrations.

Thesis in Portuguese | LILACS | ID: biblio-1397348

ABSTRACT

The methodological and instrumental advances resulting from the Human Genome Project created the framework needed for the emergence of next-generation DNA sequencing technologies, which are characterized by reduced cost, low operational demand and a large volume of data per experiment. At the same time, the increase in computational processing power has driven the development of large-scale genetic analyses, making it possible to study individualized genomic traits that had been little or never explored. Among these traits, those related to structural variation in genomes have received considerable attention. Processed pseudogenes, or retrocopies, are structural variations caused by the duplication of coding genes through the transposition of their mature messenger RNA by the LINE-1 enzymatic machinery. Retrocopies can be fixed (i.e., present in all genomes of a given species and represented in the reference genome assembly) or unfixed, being polymorphic, germline or somatic. However, knowledge about unfixed retrocopies is still limited due to the lack of bioinformatics tools dedicated to their identification and annotation in DNA sequencing data. This work therefore presents sideRETRO, a computer program specialized in detecting processed pseudogenes that are absent from the reference genome but present in whole-genome and exome sequencing data from other individuals. In addition to reporting the presence of unfixed retrocopies, sideRETRO annotates several other characteristics of these events: the genomic coordinate of the processed pseudogene insertion, which comprises the chromosome, the insertion point and the DNA strand (leading or lagging); the genomic context of the event (exonic, intronic or intergenic); genotyping (present or absent); and haplotyping (homozygous or heterozygous). To assess its efficiency, sideRETRO was run on simulated data and on real data experimentally validated by an independent group. In summary, this thesis describes the development and use of sideRETRO, a robust and efficient computational tool designed to identify and annotate unfixed processed pseudogenes. Finally, sideRETRO fills a methodological gap and enables new hypotheses and systematic investigations in the field of structural variant calling.

Subjects

Polymorphism, Genetic/genetics; Computational Biology/classification; Computational Biology/instrumentation; Costs and Cost Analysis; Genomics/instrumentation; Sequence Analysis, DNA/instrumentation; Clinical Coding

15.

Author-sourced capture of pathway knowledge in computable form using Biofactoid.

Wong, Jeffrey V; Franz, Max; Siper, Metin Can; Fong, Dylan; Durupinar, Funda; Dallago, Christian; Luna, Augustin; Giorgi, John; Rodchenkov, Igor; Babur, Özgün; Bachman, John A; Gyori, Benjamin M; Demir, Emek; Bader, Gary D; Sander, Chris.

Elife ; 10, 2021 12 03.

Article in English | MEDLINE | ID: mdl-34860157

ABSTRACT

Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.

Subjects

Computational Biology/methods; Genomics/methods; Computational Biology/instrumentation; Databases, Factual; Genomics/instrumentation; Pilot Projects

16.

Single-cell metabolomics hits its stride.

Seydel, Caroline.

Nat Methods ; 18(12): 1452-1456, 2021 12.

Article in English | MEDLINE | ID: mdl-34862499

Subjects

Computational Biology/instrumentation; Computational Biology/methods; Metabolomics/methods; Metabolomics/trends; Software; Animals; Aplysia; Astrocytes/cytology; Breast Neoplasms/diagnostic imaging; Cell Differentiation; Humans; Immune System; Lipids/chemistry; Lysosomes/metabolism; Mass Spectrometry; Metabolome; Neoplasms/diagnostic imaging; Organoids; Phenotype; Precision Medicine; Proteomics; Signal Transduction; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Tumor Microenvironment

17.

LGFC-CNN: Prediction of lncRNA-Protein Interactions by Using Multiple Types of Features through Deep Learning.

Huang, Lan; Jiao, Shaoqing; Yang, Sen; Zhang, Shuangquan; Zhu, Xiaopeng; Guo, Rui; Wang, Yan.

Genes (Basel) ; 12(11), 2021 10 24.

Article in English | MEDLINE | ID: mdl-34828296

ABSTRACT

Long noncoding RNA (lncRNA) plays a crucial role in many critical biological processes and participates in complex human diseases through interaction with proteins. Considering that identifying lncRNA-protein interactions through experimental methods is expensive and time-consuming, we propose a novel deep learning method, called LGFC-CNN, that combines raw sequence composition features, hand-designed features and structure features to predict lncRNA-protein interactions. Two sequence preprocessing methods and CNN modules (GloCNN and LocCNN) are used to extract global and local features from the raw sequence. Meanwhile, we select hand-designed features by comparing the predictive effect of different combinations of lncRNA and protein features. Furthermore, we obtain the structure features and unify their dimensions through Fourier transform. In the end, the four types of features are integrated to comprehensively predict the lncRNA-protein interactions. Compared with other state-of-the-art methods on three lncRNA-protein interaction datasets, LGFC-CNN achieves the best performance, with an accuracy of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. The results show that LGFC-CNN can effectively predict lncRNA-protein interactions by combining raw sequence composition features, hand-designed features and structure features.

Subjects

Deep Learning , Gene Regulatory Networks/physiology , Protein Interaction Maps/physiology , Long Noncoding RNA/metabolism , RNA-Binding Proteins/metabolism , Animals , Computational Biology/instrumentation , Computational Biology/methods , Datasets as Topic , Humans , Computer Neural Networks , Long Noncoding RNA/genetics , RNA-Binding Proteins/genetics

18.

OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies.

Moore, Josh; Allan, Chris; Besson, Sébastien; Burel, Jean-Marie; Diel, Erin; Gault, David; Kozlowski, Kevin; Lindner, Dominik; Linkert, Melissa; Manz, Trevor; Moore, Will; Pape, Constantin; Tischer, Christian; Swedlow, Jason R.

Nat Methods ; 18(12): 1496-1498, 2021 12.

Article in English | MEDLINE | ID: mdl-34845388

ABSTRACT

The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data.

Subjects

Computational Biology/instrumentation , Computational Biology/standards , Metadata , Microscopy/instrumentation , Microscopy/standards , Software , Benchmarking , Computational Biology/methods , Data Compression , Factual Databases , Information Storage and Retrieval , Internet , Microscopy/methods , Programming Languages , SARS-CoV-2

19.

COVIDrugNet: a network-based web tool to investigate the drugs currently in clinical trial to contrast COVID-19.

Menestrina, Luca; Cabrelle, Chiara; Recanatini, Maurizio.

Sci Rep ; 11(1): 19426, 2021 09 30.

Article in English | MEDLINE | ID: mdl-34593915

ABSTRACT

The COVID-19 pandemic poses a huge public health problem that requires the use of all available means to counter it, and drugs are one of them. In this context, we observed an unmet need to depict the continuously evolving scenario of ongoing drug clinical trials through an easy-to-use, freely accessible online tool. Starting from this consideration, we developed COVIDrugNet ( http://compmedchem.unibo.it/covidrugnet ), a web application that allows users to capture a holistic view of, and keep up to date on, how clinical drug research is responding to the SARS-CoV-2 infection. Here, we describe the web app and show through some examples how one can explore the whole landscape of medicines in clinical trials for the treatment of COVID-19 and probe the consistency of the current approaches with the available biological and pharmacological evidence. We conclude that careful analyses of the COVID-19 drug-target system based on COVIDrugNet can help to understand the biological implications of the proposed drug options, and eventually improve the search for more effective therapies.

Subjects

COVID-19 Drug Treatment , Computational Biology/methods , Clinical Trials as Topic , Computational Biology/instrumentation , Pharmaceutical Databases , Drug Repositioning , Humans , Internet , Viral Proteins/metabolism

20.

Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms.

AlQuraishi, Mohammed; Sorger, Peter K.

Nat Methods ; 18(10): 1169-1180, 2021 10.

Article in English | MEDLINE | ID: mdl-34608321

ABSTRACT

Deep learning using neural networks relies on a class of machine-learnable models constructed using 'differentiable programs'. These programs can combine mathematical equations specific to a particular domain of natural science with general-purpose, machine-learnable components trained on experimental data. Such programs are having a growing impact on molecular and cellular biology. In this Perspective, we describe an emerging 'differentiable biology' in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics.

Subjects

Biophysics/methods , Computational Biology/instrumentation , Computational Biology/methods , Deep Learning , Computer Neural Networks , Computational Chemistry , Chemical Models , Automated Pattern Recognition , Protein Conformation , Proteins/chemistry
